Abstract

Background: Greater epidemiologic understanding of the relationships among fetal-infant mortality and its
prognostic factors, including birthweight, could have vast public health implications. A key step toward that 
understanding is a realistic and tractable framework for analyzing birthweight distributions and fetal-infant 
mortality. The present paper is the second of a two-part series that introduces such a framework. 

Methods: We propose estimating birthweight-specific mortality within each component of a normal mixture 
model representing a birthweight distribution, the number of components having been determined from the data 
rather than fixed a priori. 

Results: We address a number of methodological issues related to our proposal, including the construction of 
confidence intervals for mortality risk at any given birthweight within a component, for odds ratios comparing 
mortality within two different components from the same population, and for odds ratios comparing mortality 
within analogous components from two different populations. As an illustration we find that, for a population of 
white singleton infants, the odds of mortality at 3000 g are an estimated 4.15 times as large in component 2 of a 
4-component normal mixture model as in component 4 (95% confidence interval, 2.04 to 8.43). We also outline an 
extension of our framework through which covariates could be probabilistically related to mixture components. 
This extension might allow the assertion of approximate correspondences between mixture components and 
identifiable subpopulations. 

Conclusions: The framework developed in this paper does not require infants from compromised pregnancies to 
share a common birthweight-specific mortality curve, much less assume the existence of an interval of 
birthweights over which all infants have the same curve. Hence, the present framework can reveal heterogeneity in 
mortality that is undetectable via a contaminated normal model or a 2-component normal mixture model. 



Background 

A recent report shows a slight decline in the rate of 
infants with low birthweights (less than 2500 g) in the 
United States, with a rate of 8.2 percent in 2007 com- 
pared to 8.3 percent in 2006 [1]. While the rate for 
extremely low (ELBW; <1000 g) and very low birth- 
weights (VLBW; 1000-1500 g) was unchanged at 1.5 
percent, the rate for moderately low birthweights 
(MLBW; 1500-2500 g) declined from 6.8 to 6.7 percent 
[1]. Data on the proportions of normal (NBW; 2500-4000 g) and high birthweights (HBW; >4000 g) were not provided. If confirmed in the final vital records data, the decline in the low birthweight rate will be the first in
many years. National Center for Health Statistics 
(NCHS) records indicate that low birthweight rates have 
been rising since 1984, when the rate was 6.7 percent 
[1]. 

Perinatal epidemiologists have long recognized birth- 
weight as one of several factors related to fetal growth, 
and ultimately, infant survival and development [2-4]. 
However, categories such as ELBW and VLBW, while 
useful for descriptive purposes, are not completely satis- 
factory for representing the birthweight distribution of a 
population, much less assessing the relationship between 
birthweight and fetal-infant mortality. First, cutoffs such 
as 1500 g and 2500 g are arbitrary and introduce an 
artificial discreteness to a naturally continuous phenom- 
enon: presumably fetal-infant mortality risk decreases 
only incrementally as one moves from, for example,
2499 g to 2501 g. Second, there may still be heterogene- 
ity at any fixed birthweight: some infants born at, say, 
2499 g may be at higher risk than other infants born at 
2499 g. 

The preceding considerations motivate a new frame- 
work for modeling birthweight distributions and fetal- 
infant mortality. This is the second paper in a two-part 
series that introduces such a framework. In the first 
paper, we proposed a normal mixture model for birth- 
weight distribution: 

$$\sum_{j=1}^{k} p_j f(x; \mu_j, \sigma_j), \qquad (1)$$

where $k$ is the number of components, $x$ is birthweight, $p_j$ is the fraction of births in component $j$, $\mu_j$ is the mean of the birthweights in component $j$, $\sigma_j$ is the standard deviation of the birthweights in component $j$, and $f(x; \mu_j, \sigma_j)$ is the probability density for a normal distribution with mean $\mu_j$ and standard deviation $\sigma_j$.
What distinguished our proposal from the contaminated 
normal model of Umbach and Wilcox [5] and the 2- 
component normal mixture model of Gage and Ther- 
riault [6] was that the number of components was not 
fixed a priori but rather determined from the data using 
the Flexible Information Criterion (FLIC) (Pilla and 
Charnigo, Consistent estimation and model selection in 
semiparametric mixtures, submitted). We also showed 
how to construct confidence intervals for $p_j$, $\mu_j$, and $\sigma_j$ ($1 \le j \le k$) based on multiple samples from the same
population, even if those samples overlapped. 

Here we consider estimating birthweight-specific mor- 
tality curves within each component of the normal mix- 
ture model in Equation (1). We begin by generalizing 
Gage's parametric mixtures of logistic regressions 
(PMLR) technique [7] to accommodate a normal mix- 
ture model with more than two components. We pro- 
ceed to show how confidence bounds can be 
constructed for birthweight-specific mortality curves. 
We then provide formulas for estimating mortality odds 
ratios comparing populations on the same component, 
such as 

odds of mortality at 2500 g in component 3 (white 
heavy smoking population) divided by 

odds of mortality at 2500 g in component 3 (white 
general population), 

or comparing components in the same population, 
such as 

odds of mortality at 1000 g in component 2 (white 
heavy smoking population) divided by 

odds of mortality at 1000 g in component 1 (white 
heavy smoking population). 



Being able to estimate the latter kind of odds ratio - in 
other words, being able to assert that some infants in a 
population are at higher risk than others, even when 
they are of the same birthweight - is the main advantage 
of modeling a birthweight distribution as we have pro- 
posed, rather than using a contaminated normal model 
[5] or a 2-component normal mixture model [6]. Thus, 
our two-part series provides a modeling framework 
through which heterogeneity in mortality can be 
revealed that might otherwise remain undetected. 

Results 

1. Mortality risk estimation from a single sample
a. Description of the methodology

Gage developed a parametric mixtures of logistic regres- 
sions (PMLR) technique to estimate mortality risk as a 
function of birthweight within each of two components 
in a normal mixture model describing a birthweight dis- 
tribution [7]. Although PMLR was formulated for a 2- 
component model, we generalize it to k components as 
follows. 

The risk function or birthweight-specific mortality curve for component $j$ ($1 \le j \le k$) is

$$r_j(x) = \mathrm{logit}^{-1}[\beta_j(x)] = \frac{\exp[\beta_j(x)]}{1 + \exp[\beta_j(x)]}, \qquad (2)$$

where $x$ represents birthweight and $\beta_j(x)$ is a polynomial whose coefficients must be estimated. By the law of total probability [8], the risk function for the population overall is

$$\sum_{j=1}^{k} r_j(x)\, p_j f(x; \mu_j, \sigma_j) \Big/ \sum_{j=1}^{k} p_j f(x; \mu_j, \sigma_j). \qquad (3)$$

Gage took $\beta_j(x)$ to be a second-degree polynomial, allowing the birthweight-specific mortality curves for each of his two components to be U-shaped [7]. However, since our framework permits more than two components, we are reluctant to assume that a U-shaped pattern should prevail within every component. Thus, we take $\beta_j(x)$ to be a fourth-degree polynomial, which accommodates up to two changes in convexity for each birthweight-specific mortality curve.

Since estimates of $p_j$, $\mu_j$, and $\sigma_j$ ($1 \le j \le k$) are required to calculate the Flexible Information Criterion (FLIC) (Pilla and Charnigo, Consistent estimation and model selection in semiparametric mixtures, submitted) when determining the number of components, we may assume that these estimates are now available. We then employ the optimization (optim) procedure in version 2.3.1 of R (R Foundation for Statistical Computing, Vienna, Austria, 2006) to estimate $r_1(x)$ through $r_k(x)$ by maximum likelihood conditional on the estimates of $p_j$, $\mu_j$, and $\sigma_j$ ($1 \le j \le k$). Thus, PMLR represents the second half of a two-stage procedure for modeling birthweight distribution and fetal-infant mortality. Our R code is available upon written request to the corresponding author. Section I of [Additional file 1] provides details on initial value specification for PMLR.
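The conditional maximum likelihood step can be sketched in Python as follows. This is not the authors' R code, which is available only on request; it is a minimal sketch assuming that each $\beta_j(x)$ is a fourth-degree polynomial in the rescaled birthweight $z = (x - 3000)/1000$ (the scaling used in Table 2), that the mixture estimates are held fixed, and that each infant's likelihood contribution uses the population risk of Equation (3). All function names are hypothetical. Nelder-Mead is used because it is the default method of R's `optim`; the paper does not say which method the authors selected, and the starting values below are a simple stand-in for those described in Additional file 1.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import norm


def component_weights(x, p, mu, sigma):
    """p_j f(x; mu_j, sigma_j) / sum_l p_l f(x; mu_l, sigma_l) for each infant; shape (n, k)."""
    dens = np.asarray(p) * norm.pdf(x[:, None], loc=mu, scale=sigma)
    return dens / dens.sum(axis=1, keepdims=True)


def component_risks(coefs, z):
    """r_j = logit^{-1}(beta_j), with beta_j a quartic in z; coefs has shape (k, 5)."""
    powers = np.vander(z, 5, increasing=True)  # columns: 1, z, z^2, z^3, z^4
    return expit(powers @ coefs.T)              # shape (n, k)


def neg_log_lik(flat, x, y, w, k):
    """Bernoulli negative log-likelihood using the population risk of Equation (3)."""
    z = (x - 3000.0) / 1000.0  # assumed rescaling (the one used in Table 2)
    risk = (w * component_risks(flat.reshape(k, 5), z)).sum(axis=1)
    risk = np.clip(risk, 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(risk) + (1 - y) * np.log1p(-risk))


def fit_pmlr(x, y, p, mu, sigma, start=None):
    """Estimate r_1..r_k by maximum likelihood with the mixture estimates held fixed."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = len(p)
    w = component_weights(x, p, mu, sigma)
    if start is None:  # simple stand-in: every component starts at the crude overall logit
        start = np.zeros((k, 5))
        start[:, 0] = np.log(y.mean() / (1 - y.mean()))
    res = minimize(neg_log_lik, np.ravel(start), args=(x, y, w, k),
                   method="Nelder-Mead", options={"maxiter": 5000, "maxfev": 5000})
    return res.x.reshape(k, 5), res
```

With $k$ components there are $5k$ coefficients, so for the 4-component models in this paper a gradient-based method (for example `method="BFGS"`) or carefully chosen starting values may converge more reliably than Nelder-Mead.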

b. An illustrative example 

We continue the example from Section 2a of Results 
from the previous paper, involving a data set of size 
50,000 from the NCHS Public-Use Perinatal Mortality 
Data. This data set was a random sample from the 
population of 202,849 white singletons who were born 
(or experienced fetal death) from 2000 to 2002 and 
whose mothers smoked heavily (at least twenty cigarettes per day). Equation (5) in our previous paper shows the estimates of $p_j$, $\mu_j$, and $\sigma_j$ ($1 \le j \le 4$) from the FLIC-selected 4-component model. Using these estimates, we employed PMLR as described above to estimate $r_1(x)$ through $r_4(x)$.

Figure 1a shows the results. The vertical axis is logarithmic. Estimated birthweight-specific mortality curves
for the 4 components are in dashes. We suppress the 
portions of each curve for birthweights more than three 
component standard deviations away from the compo- 
nent mean. The model-implied mortality curve, a 
weighted average of estimated birthweight-specific mor- 
tality curves for the four components, is in solid. This 
curve estimates the population risk function, whose 
form under the 4-component model is given by Equa- 
tion (3) with k = 4. Empirical mortality, defined here to 
consist of the crude mortality rates in 100 g bins, is 
depicted with circles. Crude rates of zero due to extremely small denominators are displayed near the bottom of Figure 1a.
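The model-implied curve (Equation (3)) and the binned empirical rates described above can be computed as in the following Python sketch. It assumes the component risk functions are supplied as Python callables and that 100 g bins start at multiples of 100 g; the function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def model_implied_risk(x, p, mu, sigma, risk_fns):
    """Equation (3): sum_j r_j(x) p_j f(x; mu_j, sigma_j) / sum_j p_j f(x; mu_j, sigma_j)."""
    x = np.asarray(x, dtype=float)
    dens = np.asarray(p) * norm.pdf(x[:, None], loc=mu, scale=sigma)  # (n, k)
    risks = np.column_stack([r(x) for r in risk_fns])                 # (n, k)
    return (dens * risks).sum(axis=1) / dens.sum(axis=1)


def empirical_mortality(x, y, width=100.0):
    """Crude mortality rates in fixed-width birthweight bins (100 g by default).

    Returns the lower edge of each non-empty bin, its crude rate, and its denominator.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    bins = np.floor(x / width).astype(int)
    labels = np.unique(bins)
    counts = np.array([np.sum(bins == b) for b in labels])
    rates = np.array([y[bins == b].mean() for b in labels])
    return labels * width, rates, counts
```

Returning the denominators alongside the rates makes it easy to flag bins whose crude rates rest on very few births, such as the zero rates noted above.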

Birthweight-specific mortality appears roughly U- 
shaped within component 3. The patterns for the other 
components are decreasing rather than U-shaped, 
although the decrease for component 2 plateaus in the 
HBW range. The decrease for component 4 actually 
becomes steeper in the HBW range, but this seems to 
be an artifact: the proportion of births in component 4 
is small, and there are rather few deaths at large birth- 
weights, so estimating birthweight-specific mortality 
within component 4 at large birthweights is difficult. 
The model-implied mortality curve tracks empirical 
mortality very closely when the denominators for the 
crude rates are not too small. 

c. Results from competing models for birthweight distribution

We used the same data set to estimate birthweight-speci- 
fic mortality curves for the lower residual and predomi- 
nant distributions in a contaminated normal model 
(Figure 1b) [5]. Since the estimated proportion of births
in the upper residual distribution was less than 1 in 8700, 



we did not attempt to estimate a birthweight-specific 
mortality curve for the upper residual distribution. 

The model-implied mortality curve generally appears 
reasonable, although there is an artifact at the threshold 
of 1700 g, where the lower residual distribution termi- 
nates. The contaminated normal model asserts that all 
infants at any fixed birthweight greater than 1700 g (and 
less than 5300 g, if one considers the upper residual dis- 
tribution) have the same mortality risk. Moreover, since 
the predominant distribution is virtually nonexistent in 
the VLBW and ELBW ranges, the contaminated normal 
model cannot detect heterogeneity in mortality risk at 
any fixed birthweight in the VLBW and ELBW ranges. 

We also estimated birthweight-specific mortality curves for the primary and secondary distributions in a 2-component normal mixture (Figure 1c) [6]. The model-implied mortality curve appears reasonable except for the pronounced downturn at 5100 g, which is an artifact of the extremely small denominators above 5000 g.

The 2-component normal mixture can detect hetero- 
geneity in the NBW range and parts of the MLBW and 
HBW ranges. However, since the primary distribution is 
virtually nonexistent in the VLBW and ELBW ranges, 
the 2-component normal mixture cannot detect hetero- 
geneity in those ranges. 

2. Mortality risk estimation from multiple samples 

a. Confidence bounds 

To quantify uncertainty in the estimation of birthweight-specific mortality, we proceed as follows. First, we draw $N_{rep}$ samples from the population of interest, where each sample consists of birthweight/mortality outcome pairs. Second, we fit a $k$-component normal mixture model to the birthweight data in each sample. Third, we apply PMLR to the birthweight and mortality outcome data in each sample, which yields estimated birthweight-specific mortality curves for that sample. Fourth, we use the $N_{rep}$ sets of estimated birthweight-specific mortality curves to create overall estimates of the risk functions and accompanying confidence bounds, as described below.

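Let $\hat r_{j;1}(x), \hat r_{j;2}(x), \ldots, \hat r_{j;N_{rep}}(x)$ denote the estimated birthweight-specific mortality curves for component $j$ ($1 \le j \le k$) originating from the $N_{rep}$ samples. An overall estimate of the risk function for component $j$ is

$$\bar r_j(x) = \mathrm{logit}^{-1}\Big[N_{rep}^{-1} \sum_{i=1}^{N_{rep}} \mathrm{logit}\{\hat r_{j;i}(x)\}\Big]. \qquad (4)$$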

The rationale for using the logit transformation in 
Equation (4), as well as in the elements entering Equa- 
tions (5) and (6) below, is described in Section II of 
[Additional file 1]. 
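Equation (4) averages the per-sample curves on the logit scale and then transforms back. A minimal Python sketch, assuming the per-sample curves have already been evaluated on a common birthweight grid (the function name is hypothetical):

```python
import numpy as np
from scipy.special import expit, logit


def overall_risk(per_sample_risks):
    """Equation (4): inverse logit of the mean logit across the N_rep per-sample curves.

    Row i of per_sample_risks holds r_hat_{j;i}(x) evaluated on a common grid of x.
    """
    theta = logit(np.asarray(per_sample_risks, dtype=float))
    return expit(theta.mean(axis=0))
```

Averaging logits is not the same as averaging risks: two curves at 0.01 and 0.09 combine to about 0.031 under Equation (4) rather than to 0.05.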
[Figure 1 panels: (a) Estimated mortality in 4-component model; (b) Estimated mortality in contaminated model, with curves labeled Lower Residual (2.5%), Predominant (97.5%), Implied, and Empirical; (c) Estimated mortality in 2-component model; (d) Confidence bounds for mortality in 4-component model. Horizontal axis: Birthweight (grams).]

Figure 1 Mortality for White Singleton Infants with Heavily Smoking Mothers. (a) Estimated birthweight-specific mortality curves are presented for each component of a 4-component normal mixture model, along with model-implied mortality (a superposition of the estimated birthweight-specific mortality curves) and empirical mortality (crude rates in 100 g bins). The results are based on a single sample of size 50,000 from the population of white singletons born to heavily smoking mothers. (b) and (c) Corresponding results are displayed for a contaminated normal model and a 2-component normal mixture model. (d) Estimated birthweight-specific mortality curves are presented for each component of a 4-component normal mixture model, along with confidence bounds determined by Equations (6) and (7) with $C_0 = 4.0$ and $\phi = .2465$ based on 25 samples of size 50,000.



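Fixing $x = x_0$, we set $\theta = \mathrm{logit}\{r_j(x_0)\}$ and define $\theta_1, \theta_2, \ldots, \theta_{N_{rep}}$ as $\mathrm{logit}\{\hat r_{j;1}(x_0)\}, \mathrm{logit}\{\hat r_{j;2}(x_0)\}, \ldots, \mathrm{logit}\{\hat r_{j;N_{rep}}(x_0)\}$. Denoting by $\bar\theta$ and $S_\theta$ the "meta-sample" mean and standard deviation of $\theta_1, \theta_2, \ldots, \theta_{N_{rep}}$, we construct a confidence interval for $\theta$ via either

$$\bar\theta \pm C S_\theta / \sqrt{N_{rep}} \qquad (5)$$

or, preferably,

$$\bar\theta \pm \{B_\theta + C S_\theta / \sqrt{N_{rep}}\}, \qquad (6)$$

where $B_\theta$ is a bias adjustment and $C$ is a constant chosen so that the confidence interval has the desired coverage probability (typically 95%).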
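At a single birthweight $x_0$, Equations (5) and (6) reduce to a few lines of Python. This sketch assumes the $\theta_i$ values are already available and that $S_\theta$ is the usual sample standard deviation with divisor $N_{rep} - 1$, which the text does not specify; the function name is hypothetical.

```python
import numpy as np


def logit_interval(theta, C, bias=0.0):
    """Confidence interval for theta from the N_rep per-sample values theta_1..theta_N.

    bias=0 gives Equation (5): theta_bar +/- C * S_theta / sqrt(N_rep);
    bias=B_theta gives Equation (6): theta_bar +/- {B_theta + C * S_theta / sqrt(N_rep)}.
    """
    theta = np.asarray(theta, dtype=float)
    theta_bar = theta.mean()
    s_theta = theta.std(ddof=1)  # divisor N_rep - 1 is an assumption
    half = bias + C * s_theta / np.sqrt(theta.size)
    return theta_bar - half, theta_bar + half
```

The bias adjustment simply widens the interval by $B_\theta$ on each side of $\bar\theta$.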



If Equation (6) is used, we obtain $B_\theta$ by simulation. More specifically, we randomly generate birthweights from $\sum_{j=1}^{k} \bar p_j f(x; \bar\mu_j, \bar\sigma_j)$, where $\bar p_1, \bar\mu_1, \bar\sigma_1, \ldots, \bar p_k, \bar\mu_k, \bar\sigma_k$ are the overall estimates of their respective parameters; see Section 2c of Results from the previous paper. Then we use $\bar r_1(x), \bar r_2(x), \ldots, \bar r_k(x)$ from Equation (4) to randomly generate corresponding mortality outcomes. This yields a simulated data set consisting of birthweight/mortality outcome pairs, for which the true value of $\theta$ is $\mathrm{logit}\{\bar r_j(x_0)\} = \bar\theta$. Fitting a $k$-component normal mixture model to the simulated birthweight data and then applying PMLR to the simulated birthweight and mortality outcome data, we obtain an "estimate" of $\theta$, which we denote $\theta_{sim}$. We create four more simulated data sets in the same manner, recover the value of $\theta_{sim}$ for each, and then define $B_\theta$ as the average value of $|\theta_{sim} - \bar\theta|$ over the five simulated data sets.

Charnigo et al. BMC Pregnancy and Childbirth 2010, 10:44
http://www.biomedcentral.com/1471-2393/10/44
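The simulation that defines $B_\theta$ can be sketched in Python as below. The sketch assumes the overall mixture estimates are available as arrays and the overall risk curves as callables; refitting the mixture and PMLR is delegated to a hypothetical callable `estimate_theta`, since those steps are described elsewhere. Mixing weights are renormalised to sum to one, an assumption needed because published estimates are rounded. All names are hypothetical.

```python
import numpy as np


def simulate_dataset(rng, n, p, mu, sigma, risk_fns):
    """Draw n birthweight/outcome pairs from sum_j p_j f(x; mu_j, sigma_j), with death
    probability r_j(x) for an infant drawn from component j."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()  # renormalise rounded weights (assumption)
    comp = rng.choice(p.size, size=n, p=p)
    x = rng.normal(np.asarray(mu, dtype=float)[comp], np.asarray(sigma, dtype=float)[comp])
    r = np.empty(n)
    for j, fn in enumerate(risk_fns):
        members = comp == j
        r[members] = fn(x[members])
    return x, rng.binomial(1, r)


def bias_adjustment(theta_bar, estimate_theta, rng, n, p, mu, sigma, risk_fns, n_sim=5):
    """B_theta = average of |theta_sim - theta_bar| over n_sim simulated data sets.

    estimate_theta(x, y) is a hypothetical callable that refits the k-component
    mixture and PMLR to a simulated data set and returns theta_sim.
    """
    deviations = []
    for _ in range(n_sim):
        x, y = simulate_dataset(rng, n, p, mu, sigma, risk_fns)
        deviations.append(abs(estimate_theta(x, y) - theta_bar))
    return float(np.mean(deviations))
```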

A confidence interval for $r_j(x_0)$ is obtained by applying the inverse logit transformation to the confidence interval for $\theta$. The above computations can be performed simultaneously at a series of birthweights. Connecting the resulting series of upper confidence limits produces an upper confidence bound for the risk function in component $j$, while connecting the resulting series of lower confidence limits produces a lower confidence bound.
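Repeating Equation (6) over a grid of birthweights and mapping back with the inverse logit gives pointwise bounds of the kind plotted in Figure 1d. A vectorised Python sketch, assuming a per-sample matrix of risk estimates on a shared grid and, optionally, one bias adjustment per grid point (the function name is hypothetical):

```python
import numpy as np
from scipy.special import expit, logit


def risk_bounds(per_sample_risks, C, bias=0.0):
    """Pointwise lower bound, estimate, and upper bound for r_j(x) on a birthweight grid.

    per_sample_risks has shape (N_rep, n_grid); bias is a scalar or a length-n_grid
    array of B_theta values. Equation (6) is applied on the logit scale at each grid
    point and the limits are mapped back with the inverse logit.
    """
    theta = logit(np.asarray(per_sample_risks, dtype=float))
    n_rep = theta.shape[0]
    theta_bar = theta.mean(axis=0)
    half = np.asarray(bias, dtype=float) + C * theta.std(axis=0, ddof=1) / np.sqrt(n_rep)
    return expit(theta_bar - half), expit(theta_bar), expit(theta_bar + half)
```

Because the interval is built on the logit scale, the resulting bounds always stay inside (0, 1) and are asymmetric around the estimate on the risk scale.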

As when constructing confidence intervals for $p_j$, $\mu_j$, and $\sigma_j$ ($1 \le j \le k$), we can accommodate overlap in the $N_{rep}$ samples by choosing the value of $C$ in Equation (6) according to the fraction of the underlying population that each of the $N_{rep}$ samples constitutes. Let $C_0$ denote the value of $C$ that would be chosen if this fraction were negligibly small, and let $C_\phi$ denote the value that would be chosen if this fraction were equal to $\phi$, a positive number less than 1. In the previous paper, we established the relationship

$$C_\phi = C_0 \sqrt{\phi N_{rep} / \{1 - (1 - \phi)^{N_{rep}}\}}. \qquad (7)$$
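Equation (7) as a small Python function (hypothetical name). As $\phi \to 0$ the square-root factor tends to 1, so the sketch returns $C_0$ at $\phi = 0$ rather than dividing zero by zero.

```python
import math


def c_phi(c0, phi, n_rep):
    """Equation (7): C_phi = C_0 * sqrt(phi * N_rep / {1 - (1 - phi)^N_rep}).

    phi is the sampling fraction of each of the n_rep samples (0 <= phi < 1).
    """
    if phi == 0:
        return c0  # limiting value as phi -> 0
    return c0 * math.sqrt(phi * n_rep / (1.0 - (1.0 - phi) ** n_rep))
```

For the heavy-smoking example below ($C_0 = 4.0$, $\phi = .2465$, $N_{rep} = 25$) this gives $C_\phi \approx 9.93$, so the effective multiplier is roughly two and a half times $C_0$.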
b. Illustrative example 

We continue the example from Section 2c of Results from the previous paper, involving $N_{rep} = 25$ data sets of size 50,000 from the NCHS Public-Use Perinatal Mortality Data. These data sets were random



samples from the aforementioned population of 
202,849 white singletons whose mothers smoked 
heavily. 

Figure 1d displays overall estimates and confidence bounds for the birthweight-specific mortality curves in a 4-component model for the birthweights of white singletons born to heavily-smoking mothers. We took $C_0 = 4.0$ (see Section 3a of Results) and $\phi$ = .2465 = 50,000/202,849. Table 1 presents numerical results at selected birthweights; the odds ratios are estimated as described in Section 2c of Results.

Figure 1d reveals considerable uncertainty in estimating birthweight-specific mortality, especially in the HBW range. However, the confidence bounds for components 2 and 4 have no overlap in the lower part of the NBW range, indicating heterogeneity in mortality risk despite the uncertainty in estimation. That the confidence bounds are so wide is partly due to the large $\phi$, which in turn is a consequence of the small population. Section 3b of Results will present another example in which the population is considerably larger and $\phi$ is much smaller.
c. Estimating odds ratios 

To estimate an odds ratio comparing components in the 
same population, such as 

odds of mortality at 1000 g in component 2 (white 
heavy smoking population) divided by 

odds of mortality at 1000 g in component 1 (white 
heavy smoking population), 

we apply Equation (6) with the following modifications. Instead of identifying $\theta$ with $\mathrm{logit}\{r_j(x_0)\}$, we take $\theta = \mathrm{logit}\{r_{j_1}(x_0)\} - \mathrm{logit}\{r_{j_2}(x_0)\}$, where $1 \le j_1, j_2 \le k$. Then $\exp\{\theta\}$ equals the mortality odds in component $j_1$ at birthweight $x_0$ divided by the mortality odds in



Table 1 Mortality for White Singleton Infants with Heavily Smoking Mothers

| Quantity | @ 1000 g | @ 2000 g | @ 3000 g | @ 4000 g |
|---|---|---|---|---|
| Risk in component 1 (per 1,000): $\mathrm{logit}^{-1}\{\bar\theta\}$ [point estimate] and confidence interval | 110.1 (23.2, 392.2) | - | - | - |
| Risk in component 2 (per 1,000): $\mathrm{logit}^{-1}\{\bar\theta\}$ [point estimate] and confidence interval | 460.3 (138.3, [illegible]) | 35.9 (16.9, [illegible]) | 16.2 (8.2, 31.5) | 7.6 (2.3, 25.0) |
| Risk in component 3 (per 1,000): $\mathrm{logit}^{-1}\{\bar\theta\}$ [point estimate] and confidence interval | - | 41.3 (6.1, 232.1) | 4.0 (0.7, 20.8) | 2.4 (0.2, 29.6) |
| Risk in component 4 (per 1,000): $\mathrm{logit}^{-1}\{\bar\theta\}$ [point estimate] and confidence interval | - | - | 4.7 (3.0, 7.2) | 2.8 (0.3, 28.3) |
| Odds ratio, component 1 vs. component 2: $\exp\{\bar\theta\}$ [point estimate] and confidence interval | 0.15 (0.01, 2.46) | - | - | - |
| Odds ratio, component 2 vs. component 3: $\exp\{\bar\theta\}$ [point estimate] and confidence interval | - | 0.87 (0.06, 12.7) | 4.13 (0.41, 42.0) | 3.19 (0.09, 115) |
| Odds ratio, component 2 vs. component 4: $\exp\{\bar\theta\}$ [point estimate] and confidence interval | - | - | 3.51 (1.44, 8.56) | 2.74 (0.14, 53.1) |
| Odds ratio, component 3 vs. component 4: $\exp\{\bar\theta\}$ [point estimate] and confidence interval | - | - | 0.85 (0.19, 3.79) | 0.86 (0.04, 21.1) |

Mortality risks and odds ratios are estimated at selected birthweights, based on 25 samples of size 50,000 from the population of white singletons born to heavily smoking mothers. Confidence intervals are constructed using Equations (6) and (7) with $C_0 = 4.0$ and $\phi = .2465$.
component $j_2$. Hence, $\exp\{\bar\theta\}$ is an estimate of the odds ratio, and

$$\exp[\bar\theta \pm \{B_\theta + C S_\theta / \sqrt{N_{rep}}\}] \qquad (8)$$

is a confidence interval.
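A Python sketch of Equation (8) for one pair of components at one birthweight. It assumes that $\theta_i$ is formed from the two component curves estimated in the same sample $i$, which is how we read the modification above; the function name is hypothetical.

```python
import numpy as np
from scipy.special import logit


def component_odds_ratio(r_j1, r_j2, C, bias=0.0):
    """Equation (8) for components j1 and j2 of one population at a fixed birthweight x0.

    r_j1[i] and r_j2[i] are the sample-i estimates r_hat_{j1;i}(x0) and r_hat_{j2;i}(x0);
    theta_i is the difference of their logits. Returns (estimate, (lower, upper)).
    """
    theta = logit(np.asarray(r_j1, dtype=float)) - logit(np.asarray(r_j2, dtype=float))
    theta_bar = theta.mean()
    half = bias + C * theta.std(ddof=1) / np.sqrt(theta.size)
    return np.exp(theta_bar), (np.exp(theta_bar - half), np.exp(theta_bar + half))
```

Because Equation (4) averages on the logit scale, $\exp\{\bar\theta\}$ coincides with the odds ratio computed from the two overall risk estimates. For instance, the component 2 and component 4 risks at 3000 g in Table 1 (16.2 and 4.7 per 1,000) give about 3.49, matching the tabulated 3.51 up to rounding of the risks.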

To estimate an odds ratio comparing populations on 
the same component, such as 

odds of mortality at 2500 g in component 3 (white 
heavy smoking population) divided by 

odds of mortality at 2500 g in component 3 (white 
general population), 

we define $\theta_1 = \mathrm{logit}\{r_j(x_0)\}$ for the first population and $\theta_2 = \mathrm{logit}\{r_j(x_0)\}$ for the second population. Then $\exp\{\theta_1 - \theta_2\}$ equals the mortality odds in component $j$ of the first population at birthweight $x_0$ divided by the mortality odds in component $j$ of the second population. Hence, $\exp\{\bar\theta_1 - \bar\theta_2\}$ is an estimate of the odds ratio, and

$$\exp\Big[\bar\theta_1 - \bar\theta_2 \pm \Big\{B_{\theta_1} + B_{\theta_2} + C\sqrt{S_{\theta_1}^2/N_{rep} + S_{\theta_2}^2/N_{rep}}\Big\}\Big] \qquad (9)$$

is a confidence interval. Subscripts 1 and 2 in Equation (9) identify the populations to which the "meta-sample" means, standard deviations, and bias adjustments pertain.
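Equation (9) combines the meta-samples from two populations. A Python sketch assuming, as the formula does, that both populations use the same $N_{rep}$ (the function name is hypothetical):

```python
import numpy as np
from scipy.special import logit


def population_odds_ratio(r_pop1, r_pop2, C, bias1=0.0, bias2=0.0):
    """Equation (9): component j in population 1 versus component j in population 2.

    r_pop1 and r_pop2 hold the N_rep per-sample estimates of r_j(x0) in each population.
    Returns (estimate, (lower, upper)).
    """
    t1 = logit(np.asarray(r_pop1, dtype=float))
    t2 = logit(np.asarray(r_pop2, dtype=float))
    n_rep = t1.size  # Equation (9) uses the same N_rep for both populations
    diff = t1.mean() - t2.mean()
    half = bias1 + bias2 + C * np.sqrt(t1.var(ddof=1) / n_rep + t2.var(ddof=1) / n_rep)
    return np.exp(diff), (np.exp(diff - half), np.exp(diff + half))
```

As in Equation (9), the two bias adjustments enter additively while the two standard deviations are combined in quadrature.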

3. Further illustrations 

a. Simulation study to calibrate confidence intervals 

We simulated 25 overlapping data sets of size 50,000, the degree of overlap consistent with a population of 200,000, based on the specifications in Table 2. The mixture density and the risk functions were chosen to mimic the patterns actually observed for white singletons born to heavily-smoking mothers; see panel b of Figure 1 from the previous paper and Figure 1d of the present paper. For each of various $C$ between 2.0 and 5.0, we used Equation (6) to form confidence intervals for mortality risks at selected birthweights, namely $r_1(\mu_1 - \sigma_1)$, $r_1(\mu_1)$, $r_1(\mu_1 + \sigma_1)$, $r_2(\mu_2 - \sigma_2)$, $r_2(\mu_2)$, $r_2(\mu_2 + \sigma_2)$, $r_3(\mu_3 - \sigma_3)$, $r_3(\mu_3)$, $r_3(\mu_3 + \sigma_3)$, $r_4(\mu_4 - \sigma_4)$, $r_4(\mu_4)$, and $r_4(\mu_4 + \sigma_4)$. Above, $\mu_j$ and $\sigma_j$ denote the mean and standard deviation of the birthweights in component $j$ ($1 \le j \le 4$). This was repeated nine more times, and we tabulated how many of the 120 = 12 × 10 confidence intervals contained their targets. Confidence intervals were also formed using Equation (5) for comparative purposes. The above steps were repeated with overlapping data sets consistent with a population of 1,000,000 and with nonoverlapping data sets consistent with an effectively infinite population.

The results are summarized in Table 3. With an effec- 
tively infinite population, only 75.0% of the confidence 
intervals formed using Equation (5) contained their tar- 
gets at C = 5.0. On the other hand, the confidence inter- 
vals formed using Equation (6) contained their targets 
95.0% of the time at C = 4.0. The latter finding provided 
the rationale for taking $C_0 = 4.0$ when constructing confidence intervals for mortality risks in our examples with real data.
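The tabulation behind Table 3 is a count, over the 12 targets and 10 repetitions, of intervals that contain their targets. A minimal Python sketch with a hypothetical function name; for example, 114 hits out of 120 intervals corresponds to the 95.0% quoted above.

```python
import numpy as np


def coverage(lower, upper, target):
    """Number and percentage (one decimal) of intervals [lower, upper] containing their targets."""
    lower, upper, target = (np.asarray(a, dtype=float) for a in (lower, upper, target))
    hits = int(np.sum((lower <= target) & (target <= upper)))
    return hits, round(100.0 * hits / target.size, 1)
```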

b. Another example with real data 

We drew Nrep = 25 samples of size 50,000 from the 
population of 9,162,303 white singletons born from 
2000 to 2002, without regard to maternal smoking sta- 
tus. Figure 2a and Table 4 present overall estimates and 
confidence intervals for parameters in a 4-component 
normal mixture, while Figure 2b and Table 5 pertain to 
mortality risks. Confidence intervals for mixture parameters are based on Equations (7) and (8) in the previous paper with $\phi$ = .0055 = 50,000/9,162,303 and $C_0 = 2.5$. Confidence intervals for mortality risks are based on Equations (6) and (7) in the present paper with $\phi = .0055$ and $C_0 = 4.0$.

The confidence intervals for $p_2$, $p_3$, $\mu_2$, … are considerably narrower than they were for white singletons born to heavily-smoking mothers, as are the confidence bounds for $r_1(x)$ in the VLBW range, $r_2(x)$ in the MLBW range, and $r_3(x)$ in much of the NBW range. The confidence bounds for $r_2(x)$ do not overlap those for $r_3(x)$ or $r_4(x)$ anywhere in the NBW range, indicating heterogeneity in mortality risk. In particular, the odds of mortality at 3000 g are an estimated 9.77 times as large in component 2 as in component 3 (95% confidence interval, 2.35 to 40.6) and an estimated 4.15 times as large in component 2 as in component 4 (95% confidence interval, 2.04 to 8.43).



Table 2 Mixture Model and Mortality Functions for Simulation Study

| Model feature | Specification for simulation study |
|---|---|
| Probability density for mixture model | $.007 f(x; 832, 210) + .182 f(x; 2772, 740) + .758 f(x; 3170, 417) + .052 f(x; 3804, 413)$ |
| Risk within component 1 | $r_1(x) = \mathrm{logit}^{-1}(-4.5975 - 0.2362 z + 0.3994 z^2 + 0.1690 z^3 + 0.1328 z^4)$ |
| Risk within component 2 | $r_2(x) = \mathrm{logit}^{-1}(-4.0962 - 0.7496 z - 0.0289 z^2 - 0.1094 z^3 + 0.0918 z^4)$ |
| Risk within component 3 | $r_3(x) = \mathrm{logit}^{-1}(-5.7538 - 1.7275 z + 1.6269 z^2 + 0.1897 z^3 - 0.0249 z^4)$ |
| Risk within component 4 | $r_4(x) = \mathrm{logit}^{-1}(-5.3285 - 0.2786 z - 0.1979 z^2 + 0.0535 z^3 + 0.0773 z^4)$ |

The probability density for the mixture model used in our simulation study is specified, as are the mortality risk functions associated with the mixture model components. Above, $z$ is defined as $(x - 3000)/1000$, where $x$ is birthweight in grams.
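The twelve target risks of the simulation study can be evaluated directly from the Table 2 specification. A Python sketch; the $z^2$ and $z^3$ coefficients for component 1 (0.3994 and 0.1690) are read from a damaged table and assume the same fourth-degree form as the other components. Names are hypothetical.

```python
import numpy as np
from scipy.special import expit

# Table 2 simulation specification. Component 1's z^2 and z^3 coefficients are read
# from a damaged scan and assume the same quartic form as the other components.
MU = np.array([832.0, 2772.0, 3170.0, 3804.0])
SIGMA = np.array([210.0, 740.0, 417.0, 413.0])
BETA = np.array([  # coefficients of 1, z, z^2, z^3, z^4 for components 1..4
    [-4.5975, -0.2362, 0.3994, 0.1690, 0.1328],
    [-4.0962, -0.7496, -0.0289, -0.1094, 0.0918],
    [-5.7538, -1.7275, 1.6269, 0.1897, -0.0249],
    [-5.3285, -0.2786, -0.1979, 0.0535, 0.0773],
])


def risk(j, x):
    """r_{j+1}(x) = logit^{-1}(beta_0 + beta_1 z + ... + beta_4 z^4), z = (x - 3000)/1000.

    j is a 0-based component index.
    """
    z = (np.asarray(x, dtype=float) - 3000.0) / 1000.0
    return expit(np.polyval(BETA[j][::-1], z))  # polyval wants the highest power first


# The twelve targets r_j(mu_j - sigma_j), r_j(mu_j), r_j(mu_j + sigma_j), one row per component.
targets = np.array([[risk(j, MU[j] + d * SIGMA[j]) for d in (-1, 0, 1)]
                    for j in range(4)])
```

At 3000 g ($z = 0$) only the intercepts matter, giving about 16.4, 3.2, and 4.8 deaths per 1,000 for components 2, 3, and 4, in line with the magnitudes of the Table 1 estimates that the specification was chosen to mimic.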
Table 3 Confidence Interval Coverage Probabilities in Simulation Study

| C | Population size | Bias adjustment included: number (%) of intervals containing targets (mortality risks) | Bias adjustment omitted: number (%) of intervals containing targets (mortality risks) |
|---|---|---|---|
| 2.0 | 200,000 | 69 (57.5) | 26 (21.7) |
| 2.0 | 1,000,000 | 92 (76.7) | [illegible] |
| 2.0 | Infinite | 92 (76.7) | 43 (35.8) |
| 2.5 | 200,000 | 78 (65.0) | 29 (24.2) |
| 2.5 | 1,000,000 | 100 (83.3) | 37 (30.8) |
| 2.5 | Infinite | 96 (80.0) | 47 (39.2) |
| 3.0 | 200,000 | 84 (70.0) | 32 (26.7) |
| 3.0 | 1,000,000 | 106 (88.3) | 44 (36.7) |
| 3.0 | Infinite | 102 (85.0) | 56 (46.7) |
| 3.5 | 200,000 | 89 (74.2) | 35 (29.2) |
| 3.5 | 1,000,000 | 111 (92.5) | 54 (45.0) |
| 3.5 | Infinite | 110 (91.7) | 60 (50.0) |
| 4.0 | 200,000 | 93 (77.5) | 44 (36.7) |
| 4.0 | 1,000,000 | [illegible] | [illegible] |
| 4.0 | Infinite | 114 (95.0) | 65 (54.2) |
| 4.5 | 200,000 | 97 (80.8) | 48 (40.0) |
| 4.5 | 1,000,000 | [illegible] | [illegible] |
| 4.5 | Infinite | 115 (95.8) | 76 (63.3) |
| 5.0 | 200,000 | 102 (85.0) | 57 (47.5) |
| 5.0 | 1,000,000 | 117 (97.5) | 79 (65.8) |
| 5.0 | Infinite | [illegible] | 90 (75.0) |

The row with "C" = 2 and "Population size" = 200,000 identifies the numbers and percentages of confidence intervals containing their targets of mortality risks at selected birthweights (three for each of four mixture components), based on 10 repetitions in each of which 25 samples of size 50,000 were simulated from a 4-component normal mixture. Results under the heading of "Bias adjustment included" are based on Equation (6) with C = 2. Results under the heading of "Bias adjustment omitted" are based on Equation (5) with C = 2. The 25 samples of size 50,000 had overlap consistent with a population size of 200,000. Other rows correspond to different choices of C and/or population sizes.



The reason that mixture parameters and mortality 
risks are estimated more precisely for white singletons 
in general than for white singletons born to heavily- 
smoking mothers is that Nrep = 25 samples of size 
50,000 from a population of 9,162,303 contain approxi- 
mately 1,171,467 distinct records, far more than 
the approximately 202,677 distinct records contained in 
Nrep = 25 samples of size 50,000 from a population of 
202,849. (Section II of [Additional file 1] from our pre- 
vious paper provides a formula from which one may 
approximate the number of distinct records in multiple 
samples from the same population.) Even more precise 
estimation is possible for white singletons in general if 
Nrep is taken larger. 
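The distinct-record counts quoted above can be reproduced from the expected count $M\{1 - (1 - n/M)^{N_{rep}}\}$ for a population of size $M$ and samples of size $n$, assuming simple random sampling without replacement within each sample and independence across samples. We have not checked this expression against the formula in Additional file 1 of the previous paper; it is offered as a sketch that matches the quoted numbers, and the function name is hypothetical.

```python
def expected_distinct(pop_size, sample_size, n_rep):
    """Expected number of distinct records in n_rep independent simple random samples
    (each drawn without replacement) of sample_size records from pop_size records.

    A record is missed by one sample with probability 1 - sample_size/pop_size,
    so it is missed by all n_rep samples with probability (1 - phi)^n_rep.
    """
    phi = sample_size / pop_size
    return pop_size * (1.0 - (1.0 - phi) ** n_rep)
```

Calling `expected_distinct(9_162_303, 50_000, 25)` and `expected_distinct(202_849, 50_000, 25)` gives about 1,171,467 and 202,677. Comparing with Equation (7), the factor $C_\phi / C_0$ is the square root of $N_{rep} n$ (total records drawn) divided by this expected number of distinct records.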

Discussion 

This paper completes a two-part series on a new frame- 
work for modeling birthweight distributions and fetal- 
infant mortality. The main advantage of this new frame- 
work is its potential to reveal heterogeneity in mortality 
risk that may be undetectable if one relies on a 



contaminated normal model or 2-component normal 
mixture to represent a birthweight distribution. 

With the contaminated normal model, the lower resi- 
dual distribution and the predominant distribution have 
little overlap. As such, there is little overlap in the ranges 
of birthweights over which each component has a well-defined risk function. This is depicted in Figure 1b,
where the red and green dashed curves do not occupy 
the same birthweights except for a small interval near 
1700 g. Thus, except for birthweights close to 1700 g, the 
contaminated normal model effectively imposes a unique 
mortality risk for all infants at any fixed birthweight. This 
occurs because the contaminated normal model classifies 
all NBW cases, along with almost all MLBW and HBW 
cases, as originating from the predominant distribution, 
while it classifies virtually all VLBW and ELBW births as 
arising from the lower residual distribution. Yet, presum- 
ably some compromised pregnancies yield MLBW, 
NBW, and HBW births. Hence, not only does the esti- 
mated proportion .975 overstate the fraction of uncom- 
promised pregnancies, but also no distinction can be 
[Figure 2 panels: (a) Fitted 4-component model for white general; (b) Estimated mortality for white general.]

Figure 2 Mixture Modeling Results and Mortality for White Singleton Infants. (a) A 4-component normal mixture model, with parameters estimated by combining the results for 25 samples of size 50,000 from the population of white singletons in general, is shown. (b) Estimated birthweight-specific mortality curves are presented for each component of a 4-component normal mixture model, along with confidence bounds determined by Equations (6) and (7) with $C_0 = 4.0$ and $\phi = .0055$ based on 25 samples of size 50,000 from the population of white singletons in general.



made between compromised and uncompromised preg- 
nancies at birthweights above 1700 g. 

In contrast, the 2-component normal mixture has some 
ability to reveal heterogeneity. However, this ability is limited to the MLBW, NBW, and HBW ranges. As shown in Figure 1c, the 2-component normal mixture
effectively imposes a unique mortality risk at each birth- 
weight in the VLBW and ELBW ranges. At first glance, 
that may not seem worrisome. After all, the MLBW, 
NBW, and HBW cases may arise from a mix of compro- 
mised and uncompromised pregnancies, while presum- 
ably the VLBW and ELBW cases arise almost exclusively 
from compromised pregnancies. Yet, implicit in the 2- 
component normal mixture is a belief that all compro- 
mised pregnancies are qualitatively similar, in the sense 
of sharing a common birthweight-specific mortality 
curve. Perhaps such a belief is approximately valid for 
some populations. Unfortunately, the 2-component nor- 
mal mixture imposes this belief mathematically and does 
not provide any way for it to be tested empirically. The 
framework that we have presented, on the other hand. 



allows such a belief to be tested empirically. Indeed, the 
example in Section 3b of Results shows that component 
2 in the population of white singletons has demonstrably 
higher mortality risk at some birthweights than compo- 
nent 4 in the same population. We regard component 3 
as most plausibly representing uncompromised pregnan- 
cies in this population, so that components 2 and 4 most 
plausibly consist of compromised pregnancies. Therefore, 
not all compromised pregnancies in this population share 
a common birthweight-specific mortality curve. 
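As a rough arithmetic check on the comparison between components 2 and 4, the rounded point-estimate risks at 3000 g in Table 5 (17.0 and 4.2 per 1000) can be converted to odds and divided. The sketch below is a plain odds-ratio calculation, not the estimation procedure of Equations (6) and (7) applied across 25 samples; the helper names `odds`, `odds_ratio`, and `excludes_one` are illustrative. Because the tabulated risks are rounded, the result (about 4.10) is close to, but not identical with, the reported 4.15. What makes the difference "demonstrable" is that the reported confidence interval lies entirely above 1.

```python
def odds(risk_per_1000: float) -> float:
    """Convert a risk expressed per 1000 births into odds p / (1 - p)."""
    p = risk_per_1000 / 1000.0
    return p / (1.0 - p)


def odds_ratio(risk_a: float, risk_b: float) -> float:
    """Odds ratio of group a relative to group b, from risks per 1000."""
    return odds(risk_a) / odds(risk_b)


def excludes_one(ci: tuple[float, float]) -> bool:
    """True if an odds-ratio confidence interval lies entirely on one side of 1."""
    lower, upper = ci
    return lower > 1.0 or upper < 1.0


if __name__ == "__main__":
    # Rounded point estimates at 3000 g from Table 5 (per 1000 births).
    print(f"OR, component 2 vs. 4: {odds_ratio(17.0, 4.2):.2f}")
    # Reported confidence interval for the same comparison.
    print("Excludes 1:", excludes_one((2.04, 8.43)))
```

By contrast, the interval for component 3 vs. component 4 at 3000 g, (0.18, 1.01), just includes 1.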

The components identified in our empirical explorations are undoubtedly related to gestational age. While detailed speculations about the precise nature of the relationship are premature, one or more of the components may have an elevated rate of intrauterine growth restriction (IUGR). Typically, IUGR is measured in population-based vital statistics data as births below (variously) the 5th or 10th percentile of birthweight for gestational age. Other relevant measurements not presently recorded on birth certificates in the United States include head circumference at birth, birth length (i.e., crown-heel length or crown-rump length), and waist/hip ratio. However IUGR might be quantified, its frequency within each component could be estimated as indicated in the next paragraph.

Table 4 Mixture Modeling Results for White Singleton Infants

Quantity                                  p1             p2             p3             p4
Q [average of 25 estimates]               .005           .117           .810           .068
S_Q [standard deviation of 25 estimates]  .001           [illegible]    .012           .010
[bias adjustment]                         .001           .012           .011           .007
Confidence interval                       (.004, .006)   (.099, .135)   (.792, .827)   (.058, .078)

Quantity                                  μ1 (g)         μ2 (g)         μ3 (g)         μ4 (g)
Q [average of 25 estimates]               862            2948           3402           4056
S_Q [standard deviation of 25 estimates]  60             [illegible]    [illegible]    18
[bias adjustment]                         22             47             4              36
Confidence interval                       (809, 915)     (2874, 3021)   (3395, 3410)   (4011, 4100)

Quantity                                  σ1 (g)         σ2 (g)         σ3 (g)         σ4 (g)
Q [average of 25 estimates]               233            776            421            416
S_Q [standard deviation of 25 estimates]  [illegible]    [illegible]    [illegible]    [illegible]
[bias adjustment]                         42             25             5              11
Confidence interval                       (170, 295)     (739, 813)     (413, 429)     (395, 437)

Parameters in a 4-component normal mixture model for birthweight distribution are estimated, based on 25 samples of size 50,000 from the population of white singletons in general. Confidence intervals are constructed using Equations (6) and (7) with C0 = 2.5 and φ = .0055.
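To see how the fitted mixture in Table 4 apportions infants of a given birthweight among its components, a minimal sketch can apply Bayes' rule to the averaged estimates (the Q row). This is the standard posterior-membership calculation for a normal mixture; this section does not state exactly how PMLR uses such probabilities, so the sketch is only an illustration, and the names `TABLE4_COMPONENTS` and `posterior_membership` are hypothetical.

```python
import math

# Averaged estimates (Q row of Table 4): (mixing proportion, mean in g, SD in g).
TABLE4_COMPONENTS = [
    (0.005, 862.0, 233.0),
    (0.117, 2948.0, 776.0),
    (0.810, 3402.0, 421.0),
    (0.068, 4056.0, 416.0),
]


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density f(x; mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_membership(x: float, components=TABLE4_COMPONENTS) -> list[float]:
    """Bayes' rule: probability of each component given birthweight x."""
    weighted = [p * normal_pdf(x, mu, sigma) for p, mu, sigma in components]
    total = sum(weighted)
    return [w / total for w in weighted]


if __name__ == "__main__":
    for birthweight in (1000, 2000, 3000, 4000):
        probs = posterior_membership(birthweight)
        print(birthweight, "g:", " ".join(f"{p:.3f}" for p in probs))
```

At 3000 g, for example, component 3 receives about 0.89 of the posterior weight and component 2 about 0.11, which is one way to see why mortality at that birthweight mixes infants from more than one component.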

Table 5 Mortality for White Singleton Infants

Quantity                                  @ 1000 g              @ 2000 g             @ 3000 g             @ 4000 g
Risk in component 1 (per 1000)            124.3 (71.1, 208.4)
Risk in component 2 (per 1000)            242.8 (34.3, 743.1)   52.1 (41.3, 65.6)    17.0 (9.0, 31.9)     12.1 (6.6, 22.2)
Risk in component 3 (per 1000)                                                       1.8 (0.8, 4.1)       0.3 (0.02, 3.9)
Risk in component 4 (per 1000)                                                       4.2 (3.1, 5.7)       1.2 (0.4, 3.8)
Odds ratio, component 1 vs. component 2   0.44 (0.03, 6.90)
Odds ratio, component 2 vs. component 3                                              9.77 (2.35, 40.6)    44.3 (2.55, 768)
Odds ratio, component 2 vs. component 4                                              4.15 (2.04, 8.43)    10.4 (3.24, 33.6)
Odds ratio, component 3 vs. component 4                                              0.42 (0.18, 1.01)    0.24 (0.01, 5.34)

Each entry gives the point estimate followed by the confidence interval in parentheses. Mortality risks are inverse logits, and odds ratios exponentials, of the corresponding estimated quantities. Mortality risks and odds ratios are estimated at selected birthweights, based on 25 samples of size 50,000 from the population of white singletons in general. Confidence intervals are constructed using Equations (6) and (7) with C0 = 4.0 and φ = .0055.

A useful extension of our methodology would entail probabilistically relating a covariate of interest, such as gestational age or IUGR, to the mixture components. Suppose that the covariate of interest were dichotomous. For gestational age, we could create a dichotomy by labeling infants as "preterm" or "term". Then, given a fitted k-component mixture model for birthweight distribution, we could apply PMLR with dichotomized gestational age or IUGR rather than mortality as the dependent variable. The resulting $\hat{r}_j(x)$ would denote not the estimated mortality risk but rather the estimated probability of a preterm birth or of IUGR as a function of birthweight within component j (1 ≤ j ≤ k). To estimate the overall probability of a preterm birth or of IUGR within component j, we would integrate $\hat{r}_j(x)$ over the estimated distribution of birthweights within component j,

$$\int \hat{r}_j(x)\, f(x; \hat{\mu}_j, \hat{\sigma}_j)\, dx. \qquad (10)$$
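Equation (10) is a one-dimensional integral, so a simple quadrature suffices. The sketch below approximates it with the trapezoidal rule over $\hat{\mu}_j \pm 8\hat{\sigma}_j$, using component 2's averaged estimates from Table 4 (2948 g and 776 g). The function `hypothetical_preterm_prob` is a made-up logistic curve standing in for $\hat{r}_j(x)$, which in the framework above would come from PMLR; its coefficients are purely illustrative, as are the function names.

```python
import math
from typing import Callable


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density f(x; mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def component_probability(
    r: Callable[[float], float],
    mu: float,
    sigma: float,
    n: int = 4000,
    width: float = 8.0,
) -> float:
    """Trapezoidal approximation of the integral of r(x) f(x; mu, sigma) dx.

    The range is truncated to mu +/- width * sigma; the normal mass outside
    that range is negligible for width = 8.
    """
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / n
    total = 0.5 * (r(lo) * normal_pdf(lo, mu, sigma) + r(hi) * normal_pdf(hi, mu, sigma))
    for i in range(1, n):
        x = lo + i * h
        total += r(x) * normal_pdf(x, mu, sigma)
    return total * h


def hypothetical_preterm_prob(x: float) -> float:
    """HYPOTHETICAL stand-in for r_j(x): a logistic curve that decreases with
    birthweight. The coefficients are invented for illustration only."""
    eta = -1.0 - 2.0 * (x - 3000.0) / 1000.0
    return 1.0 / (1.0 + math.exp(-eta))


if __name__ == "__main__":
    # Component 2 averaged estimates from Table 4: mean 2948 g, SD 776 g.
    p = component_probability(hypothetical_preterm_prob, 2948.0, 776.0)
    print(f"Illustrative within-component probability: {p:.3f}")
```

A library quadrature routine such as `scipy.integrate.quad` would serve equally well; the trapezoidal rule is used here only to keep the sketch dependency-free.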

Pursuing this idea and extending it to multiple covariates, both categorical and continuous, would enable us to describe the joint distribution of covariates within each mixture component. If the joint distributions of covariates within different mixture components had little overlap, then we would be able to assert an approximate correspondence between the mixture components and identifiable subpopulations with biological meaning. Such discoveries would provide greater epidemiologic insight into the relationships among fetal-infant mortality and its prognostic factors.

Conclusions 

The present paper, the second in a two-part series, develops a new and flexible approach to modeling fetal-infant mortality through the estimation of separate birthweight-specific mortality curves within each component of a normal mixture model describing a birthweight distribution, the number of components having been determined from the data rather than fixed a priori. This approach allows the detection of heterogeneity in mortality that cannot be found with a contaminated normal model or a 2-component normal mixture model. A 2-component normal mixture model assumes that infants from compromised pregnancies share a common birthweight-specific mortality curve, while a contaminated normal model assumes that all infants share a common curve over some (possibly quite large) interval of birthweights. Yet, our approach has demonstrated that components 2 and 4 in a 4-component normal mixture model for white singleton birthweights have distinct birthweight-specific mortality curves. Since components 2 and 4 in this population most plausibly consist of compromised pregnancies, we see that infants from compromised pregnancies need not share a common birthweight-specific mortality curve. Finally, this paper lays some groundwork for future research aimed at discovering approximate correspondences between mixture model components and identifiable subpopulations.

Methods 

[Additional file 1] presents technical details on our 
methodology and its implementation. 

Additional material 



Additional file 1: Technical Appendix. Additional file 1 presents 
technical details on our methodology and its implementation. 